DETAILED ACTION 
Specification 

The disclosure is objected to because of the following informalities: 

a) Throughout the disclosure, such as page 2, line 7, errors of the type 
"Abank@" should be --"a bank"--. 

b) On page 3, line 14, "word s" should be --words--. 

c) On page 5, line 8, "query 22" should be --query 20--. 

d) Throughout the disclosure, such as page 8, line 2, errors of the type 
"Bayes=s" should be --Bayes's--. 

e) On page 12, line 7, "5,000documents" should be --5,000 documents--. 

Appropriate correction is required. 



Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

1. Claims 1, 4, 13-15, 17, and 26 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Caid et al. (U.S. Patent 5,619,709). 

In regard to claim 1, Caid et al. discloses a computer implemented method 
(column 4, lines 45-65) for retrieving documents, comprising: 

Inputting the text of one or more documents wherein each document includes 
human readable words (from training text, Fig. 1B, 101, column 6, lines 26-27); 

Creating context windows around each word (target word, Figs. 2A-2F, 202) in 
each document (column 6, lines 27-52); 

Generating a statistical evaluation of the characteristics of all the windows (by 
creating context vectors, column 6, line 53 - column 8, line 67). The statistical 
evaluation of the characteristics of the windows is not a function of the order of 
appearance of the words within each window (influence of a neighbor word to a target 
word is dependent on how close in position the neighbor is to the target, but not on their 
order of appearance, column 6, lines 53-62); 

And combining the statistical evaluation for each window (context vectors are 
combined to create a document vector, column 9, lines 1-22). 

2. In regard to claim 4, Caid et al. discloses that pluralities of document categories 
are defined (bucket in a cluster tree) and the category of a particular document is based 
on the statistical evaluation for each window. 

The statistical evaluation of each window creates a context vector. The context 
vectors are categorized by finding the bucket in a cluster tree that has the closest 
centroid to that particular document (column 10, lines 58-67 and column 11, lines 1-4). 

Application/Control Number: 09/851,675 Page 4 

Art Unit: 2655 

3. In regard to claim 13, Caid et al. discloses that the step of creating context 
windows around each word further comprises the step of selecting the words appearing 
before and after each word by a predetermined amount in the document and including 
those selected words in the window (Figs. 2A-2F and column 6, lines 33-36). 

4. In regard to claim 14, Caid et al. discloses that the word around which each 
window is created is not included in the window. Caid et al. discloses that a window 
(Figs. 2A-2F, 204) is created around a target word (target 202). The window includes 
three neighbors (203) on each side of the word around which the window is created 
(target 202). In reference to Figure 2A, Caid et al. discloses that the window (204) 
"only includes the three neighbor stems 203", and in reference to Figure 2F, "window 
204 includes three neighbors 203 on each side of target 202", not the target 202. 
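
For illustration only, the window construction discussed in this paragraph (a fixed number of neighbors on each side of a target word, with the target itself left out) can be sketched as follows. This is a minimal sketch and not code from Caid et al.; the function name `context_windows`, the `radius` parameter, and the whitespace tokenization are assumptions made for the example.

```python
def context_windows(words, radius=3):
    """Return (target, neighbors) pairs for every word position.

    `radius` (hypothetical parameter) is the number of neighbors taken on
    each side; the target word itself is left out of its own window.
    """
    windows = []
    for i, target in enumerate(words):
        before = words[max(0, i - radius):i]      # up to `radius` words before
        after = words[i + 1:i + 1 + radius]       # up to `radius` words after
        windows.append((target, before + after))  # target excluded
    return windows


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog".split()
    for target, neighbors in context_windows(tokens):
        print(target, neighbors)
```

Near the start or end of the text the window is simply truncated to the neighbors that exist; that edge handling is a choice of this sketch, not a statement about the reference.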

5. In regard to claim 15, Caid et al. discloses normalizing the combined results of 
the statistical evaluation for the windows (Fig. 4, 403 and column 9, lines 21-22). 

6. In regard to claim 17, Caid et al. discloses that the step of combining includes 
averaging probability assessments (context vectors are combined by a weighted sum, 
Fig. 4, 401 and 402, column 9, lines 11-15). 
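
The combining and normalizing steps addressed in paragraphs 5 and 6 can be illustrated with a short sketch: context vectors are combined by a weighted sum and the result is scaled to unit length. The sketch assumes plain Python lists as vectors and caller-supplied weights; the name `document_vector` and the use of Euclidean normalization are illustrative assumptions, not details taken from Caid et al.

```python
import math


def document_vector(context_vectors, weights):
    """Weighted sum of equal-length context vectors, scaled to unit length."""
    dim = len(context_vectors[0])
    summed = [0.0] * dim
    for vec, weight in zip(context_vectors, weights):
        for k in range(dim):
            summed[k] += weight * vec[k]
    norm = math.sqrt(sum(x * x for x in summed))
    # Leave an all-zero vector unchanged rather than divide by zero.
    return [x / norm for x in summed] if norm else summed
```

For example, combining the vectors [1, 0] and [0, 1] with weights 3 and 4 gives the sum [3, 4], which has length 5 and therefore normalizes to [0.6, 0.8].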

7. In regard to claim 26, Caid et al. discloses a computer program storage device 
(Fig. 1, 112) and computer readable instructions on the storage device for causing a 
computer to undertake method acts to facilitate retrieving documents (column 4, lines 
59-65), the method acts comprising: 

Inputting the text of one or more documents wherein each document includes 
human readable words (from training text, Fig. 1B, 101, column 6, lines 26-27); 

Creating context windows around each word (target word, Figs. 2A-2F, 202) in 
each document (column 6, lines 27-52); 

Generating a statistical evaluation of the characteristics of all the windows (by 
creating context vectors, column 6, line 53 - column 8, line 67). The statistical 
evaluations of the characteristics of the windows are not a function of the order of 
appearance of the words within each window (influence of a neighbor word to a target 
word is dependent on how close in position the neighbor is to the target, but not the 
order of appearance, column 6, lines 53-62); 

And combining the statistical evaluation for each window (context vectors are 
combined to create a document vector, column 9, lines 1-22). 



Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

8. Claim 5 is rejected under 35 U.S.C. 103(a) as being unpatentable over Caid et al. 

In regard to claim 5, Caid et al. does not explicitly disclose determining the center 
of a particular window based on the combined statistical evaluation for each window. 



Caid et al. does disclose that windows are created around a center word (Fig. 2F, 
column 6, lines 48-50). Furthermore, the statistical evaluation of the window (creation 
of a context vector) is performed to determine the context of the center word (column 6, 
lines 53-62). 

Therefore, Caid et al. would strongly suggest to one of ordinary skill in the art at 
the time of invention that the center of a particular window could be determined based 
on the statistical evaluation for each window. 

9. Claims 2, 3, 6-12, 18-25, and 27 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Caid et al., in view of Hill (U.S. Patent 5,713,016). 

In regard to claim 2, Caid et al. does not disclose determining the likelihood of 
documents having predetermined characteristics based on the statistical evaluation for 
each window. 

Hill discloses a method of determining the relevance of a document based on the 
feature vector of a document. The method will determine the likelihood (log likelihood 
ratio) of a document (first document taken from a database of text documents) having a 
predetermined characteristic (query to be compared to first document, column 6, lines 
1-5 and lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so document vectors based on the combined statistical 
evaluation of each window, as taught by Caid et al., would be used to determine the 
likelihood of those documents having predetermined characteristics, as taught by Hill, 
so that the measure of relevance between two documents would not require any 
manual relevance feedback, as taught by Hill (column 5, lines 13-16). 

In regard to claim 3, Caid et al. discloses assigning a document identifier to each 
document (document summary vector, Fig. 4, 404, column 9, lines 21-22) and context 
window (context vectors 106). 

Caid et al. does not disclose determining the document identifier of at least one 
document having predetermined characteristics. 

Hill discloses determining whether a document has predetermined characteristics 
(column 6, lines 1-5 and lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. to determine whether a document has predetermined 
characteristics, as disclosed by Hill, in order to determine an appropriate measure of the 
relevance between the document and the predetermined characteristic, as taught by Hill 
(column 5, lines 13-16). 

10. In regard to claim 6, Caid et al. does not disclose counting the occurrences of 
particular words and tabulating totals of the counts. 

Hill discloses that a statistical evaluation of a document is based on counting the 
occurrences of particular words and particular documents and tabulating totals of the 
counts (yj is defined as the total number of times a word j occurs in a particular 
document and is used to define a context vector y, column 6, lines 13-14). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation used to create context vectors 
counted the occurrences of particular words and particular documents and tabulated 
totals of the counts, as taught by Hill, in order to provide a fast, automatic, and 
consistent statistically based method of comparing two documents, as taught by Hill 
(column 5, lines 5-7). 

11. In regard to claim 7, Caid et al. discloses that a statistical evaluation includes the 
step of generating word counts about pair-wise occurrences. A frequency function 
(Equation 3) includes a parameter F y that is a count of the total number of pair-wise 
occurrences (the total number of occurrences of a neighboring word (stem) in a plurality 
of documents (corpus), column 6, lines 57-62 and column 7, lines 36-37). 

Caid et al. does not disclose that a statistical evaluation includes the step of 
generating counts about singular word occurrences. 

Hill discloses that a statistical evaluation of a document is based on counting the 
occurrences of particular words and particular documents and tabulating totals of the 
counts (yj is defined as the total number of times a word j occurs in a particular 
document and is used to define a context vector y, column 6, lines 13-14). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation used to create context vectors 
counted the occurrences of particular words and particular documents and tabulated 
totals of the counts, as taught by Hill, in order to provide a fast, automatic, and 
consistent statistically based method of comparing two documents, as taught by Hill 
(column 5, lines 5-7), and further refined that statistical evaluation by generating word 
counts of pair-wise occurrences, as taught by Caid et al. 

12. In regard to claim 8, Caid et al. discloses pruning the number of pair-wise counts. 
Caid et al. discloses that when generating context vectors to perform a statistical 
evaluation, words that have pair-wise occurrences (stems that appear several times in 
the corpus) of more than a certain amount will not be counted (their context vectors will 
be approximated, rather than computed directly, column 22, lines 47-65). 

13. In regard to claim 9, Caid et al. discloses that pruning the number of pair-wise 
counts is much more efficient and provides "significant improvements in retrieval and 
routing performance" (column 23, lines 18-23). Furthermore, Caid et al. discloses that, 
while the creation of approximate context vectors reduces the amount of memory 
needed, the approximation introduces errors (column 19, lines 25-37). 

Caid et al. does not specifically disclose monitoring the amount of memory used 
for the pair-wise counts and pruning when a predetermined threshold of memory has 
been exceeded for the pair-wise counts. 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. to monitor the amount of memory being used for the 
pair-wise counts so that, while enough memory was available, the pair-wise counts 
would not be pruned and the context vectors would be more accurate, and to prune the 
number of pair-wise counts and make an approximation if a threshold of memory was 
exceeded, in order to reduce the amount of memory used. 

14. In regard to claim 10, Caid et al. does not disclose that the statistical evaluation 
includes the step of determining probabilities of particular words appearing in particular 
documents based on the counts. 

Hill discloses that the probability (log likelihood) of a particular word appearing in 
a particular document is based on counts of particular words and particular documents 
(column 6, lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation determined the probability a 
particular word appeared in a particular document, as taught by Hill, so that an 
appropriate statistical model was used to assess the relevance between two 
documents, as taught by Hill (column 5, lines 8-12). 

15. In regard to claim 11, Caid et al. does not disclose that the statistical evaluation 
includes determining conditional probabilities of particular words appearing in particular 
documents based on the counts. 

Hill discloses that the probability of a particular word appearing in a particular 
document is a conditional probability (log likelihood ratio describes the conditional 
probability of y given x, column 6, lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation determined the conditional 
probability a particular word appeared in a particular document, as taught by Hill, so that 
a more specific statistical model describing statistical dependence was used to assess 
the relevance between two documents, as taught by Hill (column 5, lines 8-12). 

16. In regard to claim 12, Caid et al. does not disclose that the step of calculating a 
conditional probability is based on a Simple Bayes statistical model. 

Hill discloses the calculation of the conditional probability is based on a Simple 
Bayes statistical model. In order to calculate the conditional probability (log likelihood 
probability of y given x), hyperparameters must be estimated by a Bayesian process 
(column 7, lines 49-67 and column 8, lines 1-18). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. to calculate the conditional probability based on a Simple 
Bayes statistical model, as taught by Hill, so that appropriate statistical model 
parameters were used to assess the relevance between two documents, as taught by 
Hill (column 5, lines 8-12). 

17. In regard to claim 18, Caid et al. discloses a computer system comprising: 

A storage unit for receiving and storing a plurality of documents, wherein each 
document includes human readable words (data storage 109, column 4, lines 55-57); 

A means for creating context windows around each word in each document 
(learning system 105 creates context vectors by creating a window around each word in 
the document, column 6, lines 6-10 and lines 30-38); 

A means for generating a statistical evaluation of the characteristics of each 
window (by creating context vectors, column 6, line 53 - column 8, line 67), wherein the 
order of the appearance of the words within each window is not used in the statistical 
evaluation (column 6, lines 53-62); and 

A means for combining the statistical evaluation for each window (context vectors 
are combined to create a document vector, column 9, lines 1-22). 

Caid et al. does not disclose a means for determining the probabilities of 
documents having predetermined characteristics based on the combined statistical 
evaluation for each window. 

Hill discloses a means for determining the probabilities of documents having 
predetermined characteristics based on the combined statistical evaluation for each 
window (column 6, lines 1-5 and lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. to include a means for determining the probabilities of 
documents having predetermined characteristics based on the combined statistical 
evaluation for each window, so that the measure of relevance between two documents 
would not require any manual relevance feedback, as taught by Hill (column 5, lines 
13-16). 

18. In regard to claim 19, Caid et al. discloses assigning a document identifier to 
each document (document summary vector, Fig. 4, 404, column 9, lines 21-22) and 
context window (context vectors 106). 

Caid et al. does not disclose a means for determining the document identifier of 
at least one document having predetermined characteristics. 

Hill discloses a means for determining whether a document has predetermined 
characteristics (column 6, lines 1-5 and lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. to include a means for determining whether a document 
has predetermined characteristics, as disclosed by Hill, in order to determine an 
appropriate measure of the relevance between the document and the predetermined 
characteristic, as taught by Hill (column 5, lines 13-16). 

19. In regard to claim 20, Caid et al. discloses that pluralities of document categories 
are defined (bucket in a cluster tree) and a means for determining the category of a 
particular document is based on the statistical evaluation for each window. 

The statistical evaluation of each window creates a context vector. The context 
vectors are categorized by finding the bucket in a cluster tree that has the closest 
centroid to that particular document (column 10, lines 58-67 and column 11, lines 1-4). 

20. In regard to claim 21, Caid et al. does not explicitly disclose a means for 
determining the center of a particular window based on the combined statistical 
evaluation for each window. 

Caid et al. does disclose that windows are created around a center word (Fig. 2F, 
column 6, lines 48-50). Furthermore, the statistical evaluation of the window (creation 
of a context vector) is performed to determine the context of the center word (column 6, 
lines 53-62). 

Therefore, Caid et al. would strongly suggest to one of ordinary skill in the art at 
the time of invention that the center of a particular window could be determined based 
on the statistical evaluation for each window. 

21. In regard to claim 22, Caid et al. does not disclose counting the occurrences of 
particular words and tabulating totals of the counts. 

Hill discloses that a statistical evaluation of a document is based on counting the 
occurrences of particular words and particular documents and tabulating totals of the 
counts (yj is defined as the total number of times a word j occurs in a particular 
document and is used to define a context vector y, column 6, lines 13-14). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation used to create context vectors 
counted the occurrences of particular words and particular documents and tabulated 
totals of the counts, as taught by Hill, in order to provide a fast, automatic, and 
consistent statistically based method of comparing two documents, as taught by Hill 
(column 5, lines 5-7). 

22. In regard to claim 23, Caid et al. does not disclose that the statistical evaluation 
includes a means for determining probabilities of particular words appearing in particular 
documents based on the counts. 

Hill discloses a means for determining the probability (log likelihood) of a 
particular word appearing in a particular document based on counts of particular 
words and particular documents (column 6, lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation determined the probability a 
particular word appeared in a particular document, as taught by Hill, so that an 
appropriate statistical model was used to assess the relevance between two 
documents, as taught by Hill (column 5, lines 8-12). 

23. In regard to claim 24, Caid et al. does not disclose that the statistical evaluation 
includes a means for determining conditional probabilities of particular words appearing 
in particular documents based on the counts. 

Hill discloses that the probability of a particular word appearing in a particular 
document is a conditional probability (log likelihood ratio describes the conditional 
probability of y given x, column 6, lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so the statistical evaluation determined the conditional 
probability a particular word appeared in a particular document, as taught by Hill, so that 
a more specific statistical model describing statistical dependence was used to assess 
the relevance between two documents, as taught by Hill (column 5, lines 8-12). 

24. In regard to claim 25, Caid et al. discloses that the means for creating context windows 
around each word further comprises means for selecting the words appearing before 
and after each word by a predetermined amount in the document and including those 
selected words in the window (Figs. 2A-2F and column 6, lines 33-36). 

25. In regard to claim 27, Caid et al. does not disclose determining the likelihood of 
documents having predetermined characteristics based on the statistical evaluation for 
each window. 

Hill discloses a method of determining the relevance of a document based on the 
feature vector of a document. The method will determine the likelihood (log likelihood 
ratio) of a document (first document taken from a database of text documents) having a 
predetermined characteristic (query to be compared to first document, column 6, lines 
1-5 and lines 11-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al. so document vectors based on the combined statistical 
evaluation of each window, as taught by Caid et al., would be used to determine the 
likelihood of those documents having predetermined characteristics, as taught by Hill, 
so that the measure of relevance between two documents would not require any 
manual relevance feedback, as taught by Hill (column 5, lines 13-16). 

26. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Caid et 
al., in view of Dumais et al. (U.S. Patent 6,192,360). 

Caid et al. does not disclose that the step of evaluating includes determining a 
measure of mutual information. 

Dumais et al. discloses evaluating a given feature in a textual document by 
determining a measure of mutual information (column 12, lines 59-67 and column 13, 
lines 1-54). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Caid et al., so that a measure of mutual information was determined 
in the statistical evaluation of a window, so that windows (features) with the highest 
mutual information values could be kept, while other windows (features) were not 
considered, as taught by Dumais et al. (column 13, lines 47-54), thereby reducing the 
amount of memory needed for the statistical evaluation. 
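
As a hedged illustration of the mutual information measure referred to in this paragraph, the sketch below computes mutual information between a binary feature and a binary category from a 2x2 table of joint counts. It uses the standard textbook definition, I(X;Y) = sum over x, y of p(x,y) log[p(x,y) / (p(x) p(y))]; the table layout and the function name `mutual_information` are assumptions for the example and are not taken from Dumais et al.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) from a 2x2 table of joint counts.

    table[i][j] is the number of documents with feature state i
    (0 = absent, 1 = present) and category state j (hypothetical layout).
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if table[i][j] == 0:
                continue  # a zero cell contributes nothing to the sum
            p_xy = table[i][j] / total
            p_x = row_sums[i] / total
            p_y = col_sums[j] / total
            mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi
```

A feature that is independent of the category scores zero, while a feature that perfectly predicts a balanced binary category scores log 2. Ranking features by this score and keeping only the highest is one way to realize the selection described above.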

Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Kupiec et al. (U.S. Patent 5,918,240) discloses that a document 
can be automatically summarized based on a simple Bayes statistical model. Gallant 
(U.S. Patent 5,325,298) discloses a method of creating context vectors for each word in 
a document and combining the vectors to create a document vector. McGreevy et al. 
(U.S. Patent 6,741,981) discloses a method of creating context windows around each 
word in a document and determining the relevancy of a query based on a statistical 
evaluation of the context windows. Luciw (U.S. Patent 5,434,777) discloses a method 
of creating a context window around every word in a document to determine the 
meaning of that document. Martino et al. (U.S. Patent 5,913,185) discloses a method of 
creating context windows around every word in a document to determine a natural 
language shift in that document. Turtle et al. (U.S. Patent 5,488,725) discloses a 
system of document retrieval based on probabilities calculated from the frequency of 
occurrence of a word in a document. Schuetze (U.S. Patent 5,675,819) discloses 
creating document summaries based on the combination of context vectors for each 
word in a document. 

27. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Brian L. Albertalli whose telephone number is (703) 
305-1817. The examiner can normally be reached on Monday - Friday, 8:30 AM - 5:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Talivaldis Smits, can be reached on (703) 305-3011. The fax phone number 
for the organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 